Universal approximator


The Expressive Power of Neural Networks: A View from the Width

Neural Information Processing Systems

The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-2) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-(n+4) ReLU networks, where n is the input dimension, are universal approximators.
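
As a concrete picture of the width-bounded regime, here is a minimal sketch (assuming PyTorch; narrow_relu_net is our illustrative helper, not the paper's construction) of a deep ReLU network whose hidden width is pinned at n + 4 while depth is free to grow, matching the shape of network the universal approximation theorem above concerns.

```python
import torch
import torch.nn as nn

def narrow_relu_net(n_in: int, depth: int) -> nn.Sequential:
    """Deep ReLU network with every hidden layer of width n_in + 4."""
    width = n_in + 4
    layers = [nn.Linear(n_in, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

net = narrow_relu_net(n_in=3, depth=16)
x = torch.randn(8, 3)
print(net(x).shape)  # torch.Size([8, 1])
```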



Dense Neural Networks are not Universal Approximators

Rauchwerger, Levi, Jegelka, Stefanie, Levie, Ron

arXiv.org Machine Learning

We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary continuous functions if there are no restrictions on the weight values, we show that dense neural networks do not possess this universality. Our argument is based on a model compression approach, combining the weak regularity lemma with an interpretation of feedforward networks as message passing graph neural networks. We consider ReLU neural networks subject to natural constraints on weights and input and output dimensions, which model a notion of dense connectivity. Within this setting, we demonstrate the existence of Lipschitz continuous functions that cannot be approximated by such networks. This highlights intrinsic limitations of neural networks with dense layers and motivates the use of sparse connectivity as a necessary ingredient for achieving true universality.
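
For intuition, the two regimes contrasted in the abstract can be sketched as follows (assuming PyTorch; BoundedDense, MaskedSparse, the bound B, and the density value are our illustrative choices, not the paper's exact model): the first layer keeps every weight inside [-B, B], a bounded-weight dense layer of the kind the negative result constrains, while the second fixes a sparse connectivity pattern of the sort the conclusion recommends.

```python
import torch
import torch.nn as nn

class BoundedDense(nn.Module):
    """Dense ReLU layer whose weights are clamped to [-B, B] on every forward pass."""
    def __init__(self, d_in, d_out, B=1.0):
        super().__init__()
        self.lin, self.B = nn.Linear(d_in, d_out), B

    def forward(self, x):
        w = self.lin.weight.clamp(-self.B, self.B)
        return torch.relu(nn.functional.linear(x, w, self.lin.bias))

class MaskedSparse(nn.Module):
    """The same layer with a fixed random sparsity pattern on its weights."""
    def __init__(self, d_in, d_out, density=0.1):
        super().__init__()
        self.lin = nn.Linear(d_in, d_out)
        self.register_buffer("mask", (torch.rand(d_out, d_in) < density).float())

    def forward(self, x):
        return torch.relu(nn.functional.linear(x, self.lin.weight * self.mask, self.lin.bias))

x = torch.randn(4, 16)
print(BoundedDense(16, 32)(x).shape, MaskedSparse(16, 32)(x).shape)
```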


Inverse M-Kernels for Linear Universal Approximators of Non-Negative Functions

Neural Information Processing Systems

Kernel methods are widely used in machine learning to learn, from training data, a latent function in a reproducing kernel Hilbert space. It is well known that the approximator thus obtained usually achieves a linear representation, which brings various computational benefits while maintaining great representation power (i.e., universal approximation). However, when non-negativity constraints are imposed on the function's outputs, kernel-based approximators in the literature either keep a linear representation at the cost of limited model flexibility, or retain representation power by resorting to nonlinear forms. The main contribution of this paper is a sufficient condition under which a positive definite kernel constructs flexible, linear approximators of non-negative functions. We call a kernel with this property an inverse M-kernel; it is a generalization of the inverse M-matrix. Furthermore, we show that for a one-dimensional input space, the universal exponential and Abel kernels are inverse M-kernels and thus construct linear universal approximators of non-negative functions. To the best of our knowledge, this is the first time the existence of linear universal approximators of non-negative functions has been established. We confirm the effectiveness of our results with experiments on non-negativity-constrained regression, density estimation, and intensity estimation, and close by discussing issues and perspectives for multi-dimensional input settings.
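
To see what a linear-in-parameters, everywhere non-negative approximator looks like in one dimension, here is a minimal sketch (NumPy/SciPy; it uses the exponential kernel named in the abstract, with non-negativity enforced through non-negative coefficients fitted by non-negative least squares, and should be read as a baseline in the paper's spirit rather than its inverse M-kernel machinery).

```python
import numpy as np
from scipy.optimize import nnls

def exp_kernel(x, centers):
    """1-D exponential kernel k(x, y) = exp(-|x - y|)."""
    return np.exp(-np.abs(x[:, None] - centers[None, :]))

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 50))
y_train = np.maximum(np.sin(6 * x_train), 0.0)        # a non-negative target

K = exp_kernel(x_train, x_train)
beta, _ = nnls(K, y_train)                            # coefficients constrained to beta >= 0

x_test = np.linspace(0, 1, 200)
f = exp_kernel(x_test, x_train) @ beta                # f(x) = sum_i beta_i k(x, x_i)
assert (f >= 0).all()                                 # non-negative everywhere, yet linear in beta
```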




State-space models with layer-wise nonlinearity are universal approximators with exponential decaying memory

Neural Information Processing Systems

State-space models have gained popularity in sequence modelling due to their simple and efficient network structures. However, the absence of nonlinear activation along the temporal direction limits the model's capacity. In this paper, we prove that stacking state-space models with layer-wise nonlinear activation is sufficient to approximate any continuous sequence-to-sequence relationship. Our findings demonstrate that the addition of layer-wise nonlinear activation enhances the model's capacity to learn complex sequence patterns. Meanwhile, it can be seen both theoretically and empirically that state-space models do not fundamentally resolve the issue of exponentially decaying memory. The theoretical results are corroborated by numerical verification.
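
A minimal sketch of the stacked architecture (NumPy; all names are ours) makes the division of labor explicit: each layer's recurrence is purely linear in time, so the only nonlinearity is the ReLU applied between layers, and a stable transition matrix with spectral radius below one exhibits the exponentially decaying memory the abstract refers to.

```python
import numpy as np

def ssm_layer(x, A, B, C):
    """Run h_{t+1} = A h_t + B x_t, y_t = C h_t over a sequence x of shape (T, d)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
d, T, n_layers = 4, 32, 3
x = rng.standard_normal((T, d))
for _ in range(n_layers):
    A = 0.9 * np.eye(d)                       # spectral radius < 1: memory decays exponentially
    B, C = rng.standard_normal((d, d)), rng.standard_normal((d, d))
    x = np.maximum(ssm_layer(x, A, B, C), 0)  # layer-wise ReLU supplies the only nonlinearity
print(x.shape)  # (32, 4)
```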


Universal Representation of Generalized Convex Functions and their Gradients

Nehzati, Moeen

arXiv.org Artificial Intelligence

A wide range of optimization problems can be written in terms of generalized convex functions (GCFs). When this structure is present, it can be exploited to convert certain nested bilevel objectives into single-level problems amenable to standard first-order optimization methods. We provide a new differentiable layer with a convex parameter space and show (Theorems 5.1 and 5.2) that it and its gradient are universal approximators for GCFs and their gradients. We demonstrate how this parameterization can be leveraged in practice by (i) learning optimal transport maps with general cost functions and (ii) learning optimal auctions of multiple goods. In both cases, we show how our layer converts the existing bilevel or min-max formulations into single-level problems that can be solved efficiently with first-order methods.
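
The flavor of such a layer can be sketched as a finite parameterization of c-convex functions (assuming PyTorch; GCFLayer and the inner-product cost are our illustrative stand-ins, not the paper's layer): f(x) = max_i [c(x, y_i) - b_i] is generalized convex with respect to c by construction, its parameters (y_i, b_i) live in a plain vector space (hence a convex parameter space), and autograd recovers the envelope-theorem gradient.

```python
import torch
import torch.nn as nn

class GCFLayer(nn.Module):
    """f(x) = max_i [ c(x, y_i) - b_i ]: c-convex by construction."""
    def __init__(self, d, n_pieces):
        super().__init__()
        self.y = nn.Parameter(torch.randn(n_pieces, d))   # "dual points" y_i
        self.b = nn.Parameter(torch.zeros(n_pieces))      # offsets b_i

    def cost(self, x, y):
        return x @ y.T                                    # c(x, y) = <x, y> (our stand-in cost)

    def forward(self, x):                                 # x: (batch, d) -> (batch,)
        return (self.cost(x, self.y) - self.b).max(dim=-1).values

layer = GCFLayer(d=2, n_pieces=8)
x = torch.randn(5, 2, requires_grad=True)
layer(x).sum().backward()
print(x.grad.shape)  # (5, 2): envelope gradient, grad f(x) = grad_x c(x, y_{i*})
```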


Adaptive Canonicalization with Application to Invariant Anisotropic Geometric Networks

Lin, Ya-Wei Eileen, Levie, Ron

arXiv.org Artificial Intelligence

Canonicalization is a widely used strategy in equivariant machine learning, enforcing symmetry in neural networks by mapping each input to a standard form. Yet it often introduces discontinuities that can affect stability during training, limit generalization, and complicate universal approximation theorems. In this paper, we address this by introducing adaptive canonicalization, a general framework in which the canonicalization depends both on the input and the network. Specifically, we present adaptive canonicalization based on prior maximization, where the standard form of the input is chosen to maximize the predictive confidence of the network. We prove that this construction yields continuous, symmetry-respecting models that admit universal approximation properties. We propose two applications of our setting: (i) resolving eigenbasis ambiguities in spectral graph neural networks, and (ii) handling rotational symmetries in point clouds. We empirically validate our methods on molecular and protein classification, as well as point cloud classification tasks. Our adaptive canonicalization outperforms the three other common approaches to equivariant machine learning: data augmentation, standard canonicalization, and equivariant architectures.
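
Reduced to one function, prior maximization can be sketched as follows (assuming PyTorch and a finite candidate set of transformations, which is our simplification of the paper's setting): the input is replaced by whichever transformed copy the network itself classifies most confidently.

```python
import math
import torch

def adaptive_canonicalize(net, x, transforms):
    """Return the prediction on the transformed copy of x that net is most confident about."""
    best_logits, best_conf = None, -float("inf")
    for g in transforms:
        logits = net(g(x))
        conf = logits.softmax(-1).max().item()   # predictive confidence under g (single input)
        if conf > best_conf:
            best_conf, best_logits = conf, logits
    return best_logits

# Candidate set: a grid of 8 planar rotations for a 2-D point cloud of shape (n, 2).
angles = [2 * math.pi * k / 8 for k in range(8)]
rots = [torch.tensor([[math.cos(a), -math.sin(a)],
                      [math.sin(a),  math.cos(a)]]) for a in angles]
transforms = [lambda pts, R=R: pts @ R.T for R in rots]

torch.manual_seed(0)
W1, W2 = torch.randn(2, 16), torch.randn(16, 3)
net = lambda pts: torch.relu(pts @ W1).mean(0) @ W2   # toy 3-class point-cloud classifier
print(adaptive_canonicalize(net, torch.randn(32, 2), transforms))
```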

